Today we will…
Artwork by Allison Horst
Which data follows a tidy data format?
Artwork by Allison Horst
Look at the file extension for the type of data file.
.csv : “comma-separated values”
Name, Age
Bob, 49
Joe, 40
.xls, .xlsx: Microsoft Excel spreadsheet
Either export the spreadsheet to .csv, or read it directly with the readxl package.
.txt: plain text
Using base R functions:
read.csv() is for reading in .csv files.
read.table() and read.delim() are for any data with “columns” (you specify the separator).
The tidyverse has some cleaned-up versions in the readr and readxl packages:
read_csv() is for comma-separated data.
read_tsv() is for tab-separated data.
read_table() is for white-space-separated data.
read_delim() is for any data with “columns” (you specify the separator). The functions above are special cases of it.
read_excel() is specifically for dealing with Excel files.
Remember to load the readr and readxl packages first!
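As a minimal sketch of the functions above, the example below writes the small Name/Age table to a temporary file and reads it back three ways. The temporary file and the object names (csv_path, people_base, people_readr, people_delim) are made up for this illustration, and the sketch assumes the readr package is installed.

```r
library(readr)

# Write the small example table to a temporary .csv file
# (hypothetical file, created only for this illustration).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age", "Bob,49", "Joe,40"), csv_path)

# Base R: returns a data.frame
people_base <- read.csv(csv_path)

# readr: returns a tibble; show_col_types = FALSE silences the
# message about the column types that readr guessed
people_readr <- read_csv(csv_path, show_col_types = FALSE)

# read_delim() is the general case, with the separator given explicitly
people_delim <- read_delim(csv_path, delim = ",", show_col_types = FALSE)

# For an Excel file the pattern would be similar (path is hypothetical):
# library(readxl)
# people_xl <- read_excel("people.xlsx")
```

All three calls should give the same two rows; the main visible difference is that the readr functions return a tibble rather than a plain data.frame.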
The Grammar of Graphics (GoG) is a principled way of specifying exactly how to create a particular graph from a given data set. It helps us to systematically design new graphs.
Think of a graph or a data visualization as a mapping…
…FROM variables in the data set (or statistics computed from the data)…
…TO visual attributes (or “aesthetics”) of marks (or “geometric elements”) on the page/screen.
ggplot2: elegant graphics for data analysis by Hadley Wickham
The grammar makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on specific chart types.
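One way to picture "changing a single feature at a time" is to build a plot and then add to it step by step. This is only a sketch: it assumes the ggplot2 package and its built-in mpg data set, and the object names are chosen for this illustration.

```r
library(ggplot2)

# Start with data, an aesthetic mapping, and one geom
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Change one feature at a time by adding to the existing plot
p_color <- p + aes(color = drv)            # add one more aesthetic mapping
p_scaled <- p_color + scale_x_log10()      # change only the x scale
p_facet <- p_scaled + facet_wrap(~ class)  # split into subplots by class
```

Each step leaves the earlier objects untouched, so p, p_color, and p_scaled can still be printed and compared.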
data: dataframe containing variables
aes: aesthetic mappings (position, color, symbol, …)
geom: geometric element (point, line, bar, box, …)
stat: statistical variable transformation (identity, count, linear model, quantile, …)
scale: scale transformation (log scale, color mapping, axes tick breaks, …)
coord: Cartesian, polar, map projection, …
facet: divide into subplots using a categorical variable
In ggplot2, we map variables from the data set to aesthetics on the chart using the aes() function.
What aesthetics can we set?
Not an exhaustive list – see ggplot2 cheat sheet
Global Aesthetics
Local Aesthetics
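A rough sketch of the difference between global and local aesthetics, assuming ggplot2 and its built-in mpg data set (the plot names are made up for this illustration):

```r
library(ggplot2)

# Global aesthetics: set in ggplot(), inherited by every layer.
# Both the points and the trend lines are split by drv.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local aesthetics: set inside one geom, applied to that layer only.
# Points are colored by drv, but there is a single overall trend line.
p_local <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

In p_global the smoother inherits color = drv and so fits one line per drive type; in p_local it only sees x and y.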
We use a geom_xxx() function to represent data points, and the aesthetic properties to represent variables.
one variable
geom_density()
geom_dotplot()
geom_histogram()
geom_boxplot()
two variables
geom_point()
geom_line()
geom_density_2d()
three variables
geom_contour()
geom_raster()
Not an exhaustive list – see ggplot2 cheat sheet
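To make the grouping by number of variables concrete, here is one geom from each group. This is a sketch assuming ggplot2 and two data sets that ship with it (mpg and faithfuld); the variable choices are only for illustration.

```r
library(ggplot2)

# One variable: distribution of highway mileage
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(bins = 20)

# Two variables: engine displacement against highway mileage
p_point <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Three variables: x and y positions plus a fill value.
# faithfuld is a gridded data set that ships with ggplot2.
p_raster <- ggplot(data = faithfuld,
                   mapping = aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster()
```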
A stat creates a new variable to plot (e.g., count, proportion).
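For example, a bar chart plots a count that is not a column in the data. The sketch below assumes ggplot2 (version 3.3.0 or later, for after_stat()) and the built-in mpg data set.

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default: the bar heights are a new
# variable (count) computed from the data, not a column in mpg.
p_count <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# after_stat() refers to a computed variable; here counts become proportions.
# group = 1 makes the proportions relative to the whole data set.
p_prop <- ggplot(data = mpg,
                 mapping = aes(x = class, y = after_stat(prop), group = 1)) +
  geom_bar()
```

The two plots have the same shape; only the y-axis changes from counts to proportions.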
Faceting extracts subsets of the data and places them in side-by-side graphics.
facet_grid(. ~ b): facet into columns based on b
facet_grid(a ~ .): facet into rows based on a
facet_grid(a ~ b): facet into both rows and columns
facet_wrap( ~ b): wrap facets into a rectangular layout
You can set scales to let axis limits vary across facets:
facet_grid(y ~ x, scales = ______)
"free" – both x- and y-axis limits adjust to individual facets
"free_x" – only x-axis limits adjust
"free_y" – only y-axis limits adjust
You can set a labeller to adjust facet labels:
facet_grid(. ~ fl, labeller = label_both)
facet_grid(. ~ fl, labeller = label_bquote(cols = alpha ^ .(fl)))
facet_grid(. ~ fl, labeller = label_parsed)
Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
position = 'dodge': Arrange elements side by side.
position = 'fill': Stack elements on top of one another and normalize the height.
position = 'stack': Stack elements on top of one another.
position = 'jitter': Add random noise to the x and y position of each element to avoid overplotting (see geom_jitter()).
It is good practice to put each geom and aes on a new line.
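The faceting and position options above can be sketched as follows, with each layer on its own line. The example assumes ggplot2 and its built-in mpg data set; the variables used are chosen only for illustration.

```r
library(ggplot2)

# Facets: one row of panels per value of drv, with y-axis limits
# free to vary, and labels showing both variable name and value
p_facet <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ ., scales = "free_y", labeller = label_both)

# Position adjustments: the same bars, arranged three ways
p_stack <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "stack")

p_dodge <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

p_fill <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "fill")

# Jitter adds a small amount of random noise to reduce overplotting
p_jitter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = "jitter")
```

Printing p_stack, p_dodge, and p_fill in turn shows the same counts as stacked bars, side-by-side bars, and bars normalized to a height of 1.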
What makes bad figures bad?
Edward R. Tufte is a better-known critic of this style of visualization:
bad data.
Looking at pictures of data means looking at lines, shapes, and colors
Our visual system works in a way that makes some things easier for us to see than others
Graphics consist of:
Structure: boxplot, scatterplot, etc.
Aesthetics: features such as color, shape, and size that map other characteristics to structural features
Both the structure and aesthetics should help viewers interpret the information.
What sorts of relationships are inferred, and under what circumstances?
| Gestalt Hierarchy | Graphs |
|---|---|
| Enclosure | Facets |
| Connection | Lines |
| Proximity | White Space |
| Similarity | Color/Shape |
Implications for practice
Pre-Attentive Features are things that “jump out” in less than 250 ms
There is a hierarchy of features
Hue: shade of color (red, orange, yellow…)
Intensity: amount of color
Both hue and intensity are pre-attentive. Bigger contrast corresponds to faster detection.
Use color to your advantage
When choosing color schemes, we will want mappings from data to color that are not just numerically but also perceptually uniform
Distinguish between sequential scales and categorical scales
No more than 7 colors
Can use colorRampPalette() (from the grDevices package, which is loaded with base R) to produce larger palettes by interpolating existing ones, such as those from RColorBrewer
Use color gradient with only one hue for positive values
Use color gradient with two hues for positive and negative values. Gradient should go through a light, neutral color (white)
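A sketch of the one-hue and two-hue gradients described above, assuming ggplot2 and its faithfuld data set. The specific colors, the centered variable, and the object names are assumptions made for this illustration.

```r
library(ggplot2)

# colorRampPalette() (grDevices, loaded with base R) returns a function
# that interpolates between the colors it is given.
seq_pal <- colorRampPalette(c("white", "darkblue"))(9)              # one hue
div_pal <- colorRampPalette(c("darkred", "white", "darkblue"))(11)  # two hues through white

# Positive values only: a single-hue gradient
p_seq <- ggplot(data = faithfuld,
                mapping = aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_gradient(low = "white", high = "darkblue")

# Positive and negative values: two hues meeting at white, centered on zero.
# `centered` is a hypothetical variable created only for this illustration.
faithful_centered <- transform(faithfuld, centered = density - mean(density))
p_div <- ggplot(data = faithful_centered,
                mapping = aes(x = waiting, y = eruptions, fill = centered)) +
  geom_raster() +
  scale_fill_gradient2(low = "darkred", mid = "white", high = "darkblue",
                       midpoint = 0)
```

Setting midpoint = 0 keeps the neutral color at zero, so the two hues separate values below and above it.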
There are packages available for use that have color scheme options.
Some Examples:
There are packages such as RColorBrewer and dichromat that have color palettes which are aesthetically pleasing, and, in many cases, colorblind friendly.
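As a small sketch of using RColorBrewer, assuming the RColorBrewer and ggplot2 packages are installed (the "Dark2" and "Blues" palette choices and the object names are only for illustration):

```r
library(ggplot2)
library(RColorBrewer)

# A qualitative palette for categories
cat_pal <- brewer.pal(n = 3, name = "Dark2")

# List the palettes that RColorBrewer flags as colorblind friendly
cb_friendly <- rownames(brewer.pal.info[brewer.pal.info$colorblind, ])

# Interpolate an existing 9-color sequential palette to get more shades
blues_15 <- colorRampPalette(brewer.pal(n = 9, name = "Blues"))(15)

# Use a Brewer palette directly in a plot
p_brewer <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
```

Calling display.brewer.all(colorblindFriendly = TRUE) draws the same colorblind-friendly palettes for a visual comparison.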
You can also take a look at other ways to find nice color palettes.